Unsupervised Feature Generation using Knowledge Repositories for Effective Text Categorization

نویسندگان

  • R. Rajendra Prasath
  • Sudeshna Sarkar
چکیده

We propose an unsupervised feature generation algorithm using the repositories of human knowledge for effective text categorization. Conventional bag of words (BOW) depends on the presence / absence of keywords to classify the documents. To understand the actual context behind these keywords, we use knowledge concepts / hyperlinks from external knowledge sources through content and structure mining on Wikipedia. Then, the features of knowledge concepts are clustered to generate knowledge cluster vectors with which the input text documents are mapped into a high dimensional feature space and the classification is performed. The simulation results show that the proposed approach identifies associated features in the text collection and yields an improved classification accuracy.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Study on Feature Selection Methods for Text Mining

Text mining has been employed in a wide range of applications such as text summarisation, text categorization, named entity extraction, and opinion and sentimental analysis. Text classification is the task of assigning predefined categories to free-text documents. That is, it is a supervised learning technique. While in text clustering (sometimes called document clustering) the possible categor...

متن کامل

Acquisition of Common Sense Knowledge for Basic Level Concepts

Feature norms can be regarded as repositories of common sense knowledge for basic level concepts. We acquire from very large corpora feature-norm-like concept descriptions using a combination of a weakly supervised method and an unsupervised method. The success in identifying the specific properties listed in the feature norms as well as the success in acquiring the classes of properties presen...

متن کامل

Domain Kernels for Text Categorization

In this paper we propose and evaluate a technique to perform semi-supervised learning for Text Categorization. In particular we defined a kernel function, namely the Domain Kernel, that allowed us to plug “external knowledge” into the supervised learning process. External knowledge is acquired from unlabeled data in a totally unsupervised way, and it is represented by means of Domain Models. We...

متن کامل

Feature Generation for Text Categorization Using World Knowledge

We enhance machine learning algorithms for text categorization with generated features based on domain-specific and common-sense knowledge. This knowledge is represented using publicly available ontologies that contain hundreds of thousands of concepts, such as the Open Directory; these ontologies are further enriched by several orders of magnitude through controlled Web crawling. Prior to text...

متن کامل

Mapping Semantic Knowledge for Unsupervised Text Categorisation

Text categorisation is challenging, due to the complex structure with heterogeneous, changing topics in documents. The performance of text categorisation relies on the quality of samples, effectiveness of document features, and the topic coverage of categories, depending on the employing strategies; supervised or unsupervised; single labelled or multi-labelled. Attempting to deal with these rel...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010